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Abstract 

The neural-network interatomic potential for crystalline and liquid Si has been developed using 
the forward stepwise regression technique to reduce the number of bases with keeping the accuracy 
of the potential. This approach of making the neural-network potential enables us to construct 
the accurate interatomic potentials with less and important bases selected systematically and less 
heuristically. The evaluation of bulk crystalline properties, and dynamic properties of liquid Si 
show good agreements between the neural-network potential and ab-initio results. 
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I. INTRODUCTION 


Molecular dynamics (MD) is a powerful simulation tool for atomistic- to nano-scale dy¬ 
namic phenomena and nowadays widely used in a variety of research field such as crys¬ 
tal growths,- defect formations,- 1 ^ ion bombardment,- radiation damages,- ion conduction, - 
thermal conduction,- and chemical reactions /' 8 The validity of those simulation results de¬ 
pends strongly on the interatomic potentials used. An ab-initio MD is an extreme example 
of the accurate interatomic potential, however, since it costs significantly more than the 
empirical potentials, it is not suitable for large-scale or long-time simulation. Thus a lot of 
empirical interatomic potentials have been proposed since the MD was invented. However, 
most empirical potentials developed so far reproduce limited physical quantities of a certain 
situation, since they have only a few basis functions such as one or two two-body terms, a 
three-body term, and a functional terrn.- 

Under this circumstance, some studies of making more accurate interatomic potentials 
applicable for a variety of situations using many basis functions and machine-learning tech¬ 
niques have emerged such as the neural network (NN) potential,— - — the Gaussian approxi¬ 
mation potential (GAP),— and the linear regression (LR) potential.— Since these potentials 
use more basis functions than former empirical potentials that consist of one or a few func¬ 
tions, they reproduce reference values better and usually they are more accurate than the 
former potentials whereas they are much slower than the former potentials. Comparing 
the NN and LR potentials, the NN potential is expected to be more accurate than the LR 
potential in case of the same number of bases, since it has non-linear activation functions in 
the NN structure. This non-linearity is expected to work like bond-order— or environment- 
dependent potentials.— Because of this, and also because we can control the accuracy and 
computational cost by choosing the number of layers and nodes in a layer, we adopt the NN 
potential in this paper. However, it is still heuristic that what kind of and how many basis 
functions should be used for the NN, and we have to find appropriate basis functions by 
trial-and-error. 

Seko et al.— showed that the number of bases in the LR potential can be drastically 
reduced retaining its accuracy by using the least absolute shrinkage and selection operator 
(LASSO) technique. The LASSO adds a penalty term to the function to be minimized 
and enables us to select important bases by eliminating other bases. This technique is very 
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powerful to reduce and select bases relevant to the system considered, however, to the best 
of our knowledge, such a basis selection has not been employed in the NN potential. We 
adopt a similar sparsiheation technique, called the forward stepwise (FS) regression,— which 
can usually sparsify more aggressively or reduce more bases than the LASSO. With using 
the FS regression, important bases are selected and the number of bases in NN potential is 
reduced systematically and less heuristically. 

In this paper, we create a NN potential for crystalline and liquid Si system, since in spite 
of a lot of efforts to make accurate and general-purpose interatomic potential for Si system, 
there is no sufficient one that can be used over the wide range of situation and well reproduces 
physical properties of crystalline structures, amorphous structures, and dynamic properties 
of liquid. For example, the Stillinger-Weber (SW) potential does not well reproduce some 
physical properties of a crystalline structure and needs further refinement of parameters,— 1 ^ 
and the power spectra of velocity auto-correlation calculated using SW potential and Tersoff 
potential 26,2 - do not agree with that of an ab-initio calculation. 28,29 This paper compares 
some physical properties of a variety of situations obtained with the NN potential to those 
obtained with other potential and the density-functional theory (DFT) calculation in the 
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II. COMPUTATION DETAILS 


We adopt the 1-hidden-layer NN model originally developed by Behlcr and Parrinellc3 
as shown in Fig. [H The energy of an atom-?' in a structure-s is defined as the following 
equations, 


Vi-m 



( 1 ) 

( 2 ) 


Here w l mn is the weight of the line from the node-r? in (l — l)-th layer to the node-m in 
l -tli layer, y\ m is the value of node-m in Z-tli layer, and G* y ^ c is the n-tli symmetry function 
of a certain type, which depends on interatomic bond distances from the atom-? or angles 
between bonds around atom-?. Superscripts on indicate the type of symmetry function 
that can be 2-body or 3-body function. The activation function cr(x) is defined using the 
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sigmoid function as, 
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Here we subtract 1/2 from the sigmoid function since the activation function is required to 
be zero when inputs are zero. 

The symmetry functions include two types for 2-body functions, Gaussian and cosine, 
and one 3-body angular function: 
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where r] n , £ n , A n are the parameters, is the interatomic distance between i and j, 9ijk is 
the angle between bonds i-j and i-k , and the cutoff function / c (r) is defined as 


fc(r) = 
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, for r < r c , 
for r > r c . 

Here r c is the cutoff radius. As shown in Fig. |T] each function type has some different 
parameters, for example, in case of the Gaussian type, there are NQ auss parameters {rj n }. 

The weights in the NN potential {w} are optimized through minimizing the following 
function, 
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where E S,F>FT and E S,NN = JT F/ S ’ NN is the cohesive energies of a structure-s obtained via 
DFT and NN-potential calculations; F/^ )FT and F ,are the cc-components of forces on 
atom-?' in the structure-s obtained via DFT and NN-potential calculations. N is the number 
of atoms in a structure-s, and N s is the number of samples considered. The parameter k is 
the conversion factor for forces that absorbs the difference of units between energy (eV) and 
force (eV/A); it is set 0.05, which makes the force value 0.2 eV/A correspond to the energy 
value 0.01 eV. The force term is important if the potential is applied to some dynamics 
simulation.— 
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FIG. 1: The structure of 1-hidden-layer NN model for Si system. 

In order to select relevant and small number of bases, we adopt the FS regression 
approach.— The FS regression in this study proceeds as follows: 

1. Weights on lines connecting bases and 1st layer nodes {tc 1 } are set zero and other 
weights are set non-zero. Hence the weights to be optimized, {w} 0 , include only {u> 2 } 
at this point. 

2. Compute the sum of squared gradients associated to {w 1 }, 



and find the largest one g“ iax . 

3. Append the weights, {ru^ nemax }, which is related to the basis just selected in the 
previous step, to the weight set to be optimized, {w;}o. 

4. Perform the minimization with respect to the current weight set {ta} 0 until the con¬ 
vergence criterion, \C m — £" l-1 |/|£ m ~ 1 | < £ to i> is achieved or the iteration m exceeds 
the maximum iteration number M, where C m is the C value at m-th iteration. 

5. Go back to 2 unless all the bases are selected. 
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The number of iterations M in the minimization at the step 4 should be large enough 
to find the minimum or at least the basin of the minimum in the {tt;} 0 -space. However, 
since large M results in a lot of computation time for the iteration of the outer loop, M 
or the convergence criterion for the minimization is usually set moderate value determined 
empirically. Since the FS regression searches the minimum near the origin of parameter 
space {w 1 } starting from the origin = 0, over-fitting can be suppressed and a sparse 
NN model can be obtained as same as the LASSO. The FS regression has an advantage 
against the LASSO with regard to the computational cost of minimization, since the FS 
does not need to evaluate gradients of all weights but only the weights to be optimized 
{tc} 0 during the inner minimization loop, whereas the LASSO needs the gradients of all the 
weights. We can select the important bases that appears earlier than the other ones. 

The sample structures prepared in this work were chosen so that the NN potential can 
well reproduce the crystalline and amorphous Si structures, some defect formation energies, 
and dynamical properties of liquid Si The samples include: perfect crystalline Si of some 
different crystal structure, such as diamond, fee, bcc, and hep; crystalline structure which 
involves a Si vacancy; generalized stacking faults on (111) plane of diamond structure; 
and uniform deformation, shear deformation, and random atom displacements of above 
mentioned structures; and liquid Si structures extracted from ab-initio MD simulation runs. 
We prepare 3930 samples that include 3930 energies and 90096 force components. The 
samples are divided into training-set data and test-set data, and the training-set data are 
used to optimize the NN and the test-set data are used to evaluate the NN potential. In 
this paper, randomly selected 10 % of all the samples are chosen as test samples. 

To obtain the reference values, we adopt the DFT calculation using vasp (Vienna Ab- 
initio Simulation Package).— 1 ^ The generalized gradient approximation (GGA) functional^ 
for the exchange-correlation potential and the projector augmented wave (PAW) method^ 
for the pseudo-potential of Si are adopted. The energy cutoff of expansion of plane waves 
is 250 eV. Phonon dispersion relations are calculated using phonopy.— Molecular statics 
and MD calculations of liquid Si with NN potential are performed using our own code, NAP 
(Nagoya Atomistic simulation Package).— 
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FIG. 2: RMSEs of energy and force of test set as functions of number of bases extracted through 
the FS regression. 


III. RESULTS AND DISCUSSION 

Relevant bases for crystalline and liquid Si system are selected using the FS regression as 
follows. The cutoff radius r c and the number of nodes in a hidden layer are set 4.0 A and 
10, respectively, according to the previous studies^ 1 ^ and preliminary calculations. We 
prepare 300 bases including Gaussian, cosine, and angular types as (Ncauss, N cos , N ang ) = 
(100,100,100). The number of iterations in the inner minimization, M, at the step 4 of the 
FS regression is 200. During the FS regression, the root mean squared errors (RMSEs) of 
energy and force for test set decrease as the number of bases increases as shown in Fig. [21 
Eighteen bases are chosen since the RMSE of energy is less than 10 meV/atom. The selected 
bases include four Gaussian bases with { 77 } = {0.1796, 0.4184, 1.0551, 1.9306}[A 2 ], eleven 
cosine bases with {£} = {0.7854, 0.8816, 1.1701, 1.4105, 1.7471, 2.4684, 2.6607, 2.8050, 
2.8531, 2.9493, 3.1416}[A _1 ], and three angular bases with {A} = { 0 . 0 , 0.6327, 1 . 0 }. Then, 
the weights {u>} of these 18 bases are fully optimized within the space of selected bases. Final 
values of RMSE of energy and force obtained via the minimization are 3.9 meV/atom and 
173 meV/A, respectively. Figure [3] shows the relation between energies of samples obtained 
using NN potential and DFT. We can see that the NN potential with reduced number of 
bases well reproduces the DFT energies over a wide variety of structures. 

It is worth looking into the NN structure in detail to understand how the NN potential 
works with selected bases. There are three angular bases selected through the FS regression. 
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FIG. 3: Relation between NN energies and DFT energies of sample data set. Energies about 
training set (filled circle) and test set (open triangle) are mostly on the ideal line. 


However, the A = 1/3, which corresponds to the bond angle of diamond Si and is adopted 
in the SW potential, is not included in the selected bases. The angular basis with A = 1/3 
can be expanded with bases using A = 0, 2/3, and 1, as 

^cosd + 0 = i(cosd + 0) 2 + ^cosd + ^ - ^(cosd + l) 2 . (10) 

Thus one may think that the selected three bases of {A} = {0.0, 0.6327,1.0} are just a 
reducible form of the basis with A = 1/3. However, seeing Fig. SI which shows the weights 
from the three angular bases in the NN structure, the three bases contribute to nodes in 
the hidden layer in many ways, not like Eq. oa. Moreover, the nodes in the hidden layer 
receive contributions not only from angular bases but also from two-body bases, and the 
energy of an atom-i is obtained from a weighted sum of these node values as in Eq. ©• 
Therefore, it is difficult to know which bases are important and to determine the number of 
bases heuristically. 

Elastic properties of Si diamond crystal obtained using the DFT, present NN potential, 
and SW potential are listed in Table [0 The NN potential reproduces elastic properties mostly 
within 10%; the lattice constant ao, bulk modulus B, and a component of elastic constants 
Cn are in good agreement. However, C \2 and C 44 differ slightly from the reference values, 
since the corresponding deformation modes for them are not included in the sample set. If we 
add some structures relevant to deformation modes to the training set, the potential could 




Node index 

FIG. 4: Weights from angular terms to nodes in the hidden layer, {iCm neang }. White, gray, and 
black bins indicate the weights from the angular bases with A = 0.0, 0.6327, and 1.0, respectively. 


TABLE I: Elastic properties of diamond Si, such as lattice constant clq, cohesive energy E co h, bulk 
modulus B, elastic constants C\ i, C 12 , and C44 calculated using DFT, SW potential, and NN 
potential. 



DFT 

SW 

NN 

a 0 (A) 

5.474 

5.431 

5.476 

E co h (eV/atom) 

-4.541 

-4.336 

-4.537 

B (GPa) 

88.31 

101.38 

84.08 

Dn (GPa) 

168.0 

149.8 

165.1 

C V2 (GPa) 

77.4 

74.8 

64.4 

C 44 (GPa) 

95.4 

109.7 

86.3 


be improved. Figure [5] shows the phonon dispersion relations of the NN potential, DFT, 
and SW potential. The NN potential reproduces the phonon dispersion of DFT mostly 
better than SW potential. However, transversal acoustic branches of NN at the Brillouin 
zone edge are not good agreement with those of DFT. This can also be attributed to the 
lack of relevant structures in sample data set, and the NN potential cannot well extrapolate 
energies or forces of such deformation modes. 

The static and dynamic properties of liquid Si are also examined and compared between 
the NN, DFT, and SW, since it is known that the SW and Tersoff potentials do not well 
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FIG. 5: Phonon dispersion relations of DFT, SW, and NN. NN potential reproduces DFT phonon 
dispersion better than SW potential. 


reproduce the dynamic properties of liquid Si obtained by the DFT.— ^ Liquid structures 
and properties are obtained through MD simulation runs at 2,000K with about 7% larger 
density than that of the equilibrium diamond crystal at OK.— Figures El a), Efb), andE^c) 
show the radial distribution functions g(r), velocity autocorrelation functions Z(t), and 
their power spectra Z(oj), respectively. In all cases, the NN potential reproduces DFT 
results well, whereas the SW potential does not especially in Z(t) and Z{oj). Especially, 
on the self diffusion, which can be estimated from the Z(uj = 0), the NN potential well 
reproduces DFT result much better than the SW. This dynamic property of liquid Si is not 
well reproduced if the force data are not included in the function to be minimized unlike in 
Eq. dSJ) . Hence the force-matching is necessary when we create potentials for the purpose of 
dynamics simulation. 

IV. CONCLUSION 

We have constructed the neural-network (NN) interatomic potential for Si system with 18 
bases selected from original 300 bases through the forward stepwise (FS) regression. The FS 
regression technique, which is similar to the sparsification by the least absolute shrinkage and 
selection operator (LASSO), allows us to extract important bases for the system considered. 
This approach can be straightforwardly applicable to any compound systems and enable us 
to create accurate and fast NN potentials. The present NN potential shows good agreements 
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FIG. 6: (a) Radial distribution function g(r), (b) velocity autocorrelation function Z(t), and (c) 


its power spectrum Z(u), of liquid-Si at 2,000 K. NN well reproduces DFT results compared to 
SW. 


with DFT results such as elastic properties and phonon dispersion relation of crystalline Si, 
and static and dynamic properties of liquid Si. Thus it can be used for a variety of situations 
such as melting, amorphization, crystallization, and mechanical deformation of Si crystal. 
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